Denizen - A Better Way to Give Parking Tickets

The Use Case

A parking ticket is the last thing any driver wants to find upon returning to their car. Yet despite the frustration tickets cause drivers, administering them actually benefits cities in several ways. Cities that consistently enforce parking regulations promote greater respect for parking rules, discourage illegal parking that can obstruct traffic or endanger bicyclists and pedestrians, and generate ticket revenue that helps fund critical public services. For these reasons, cities should be interested in strategies that let them administer parking tickets more efficiently and maximize these social and financial benefits.

Most cities employ a workforce of parking enforcement officers who are responsible for identifying illegally parked vehicles and administering tickets. For this project, we set out to design a new approach to parking enforcement that would both increase the volume and reduce the cost of parking ticket administration for cities.

Our Solution

Our solution is an app called Denizen that allows cities to crowdsource their parking enforcement to ordinary citizens, and equips those citizens with the predictive power of big data. We want to allow anyone to ticket vehicles that have parked illegally or for longer than they’ve paid at a meter. Our app helps users identify locations likely to have parking infractions, report individual vehicles, and be rewarded for doing so. We think there is a significant opportunity to get ordinary citizens more involved in government activities in ways that are beneficial for both parties, and we think parking ticket enforcement is a great use case to start with. More information about our solution can also be found in our informational video.

How does the app work?

Step 1: Identify. Users will consult a heatmap in our app to identify the areas in their city most likely to have illegally parked cars at certain times of the day.

Step 2: Report. After finding a vehicle suspected of being parked illegally, or one in a metered space whose payment the user wants to verify, the user takes a picture of the license plate through the app.

Step 3: Reward. The app will use the time and location of the photo to validate whether the vehicle is parked illegally or past its allowed time. If a ticket is warranted, the user would receive a percentage of the dollar amount of the ticket and reward points in the app. Users will be able to track their progress over time and compete against other users in their city to achieve a high ranking on the app’s leaderboard.
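As a rough sketch of the validation in Step 3, the core check amounts to comparing the geotagged photo's timestamp against the meter's paid-through time. The function and variable names below are illustrative assumptions, not the app's actual implementation:

```r
# Hypothetical sketch: a ticket is warranted if the geotagged photo was taken
# after the paid parking session at that meter expired.
ticket_warranted <- function(photo_time, paid_through) {
  photo_time > paid_through
}

photo_time   <- as.POSIXct("2018-04-30 14:30", tz = "UTC")
paid_through <- as.POSIXct("2018-04-30 14:00", tz = "UTC")
ticket_warranted(photo_time, paid_through)  # the session expired 30 minutes earlier
```

In practice the app would also need the vehicle's location to match a specific meter or block face, which is where the meter geometry loaded later in this document comes in.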

Market Size

How big is the market for parking tickets? Every year approximately $100MM of parking tickets are given in San Francisco alone. Across the largest 16 U.S. cities approximately $1.4Bn of parking tickets are given every year.

City Benefits

How does this benefit city governments? We estimate that an average $75 ticket costs San Francisco at least $16 in parking attendant salaries and benefits. We think our app could take a 10% fee and give users a 10% reward and still generate more net parking revenue for the city.
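A quick back-of-the-envelope check makes the claim concrete. The $75 ticket and $16 enforcement cost come from the text; the exact 10% fee and 10% reward structure is our assumption:

```r
# Per-ticket comparison for a $75 ticket, using the $16 attendant cost
# estimate from the text; the 10% app fee and 10% user reward are assumptions.
ticket         <- 75
attendant_cost <- 16
app_fee        <- 0.10 * ticket   # $7.50 to the app
user_reward    <- 0.10 * ticket   # $7.50 to the reporting citizen

net_status_quo <- ticket - attendant_cost         # $59 to the city today
net_with_app   <- ticket - app_fee - user_reward  # $60 to the city with Denizen
```

Even after paying both the fee and the reward, the city nets slightly more per ticket than under the status quo, before counting any increase in ticket volume.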

The Model

Overview

How does the app predict where parking infractions are likely to occur? The app uses a parking meter dataset to predict where and when parking sessions end. Our analysis has shown that weekdays are very similar to one another and that the number of parking sessions has a single peak in the middle of each day.

The model incorporates various spatial features in order to improve the accuracy of its predictions. These features include distance to transit, neighborhood, time of day, and distance to bars and restaurants. Because of the size of the underlying dataset, a single week of data was used to inform predictions. This week was chosen because it had good weather and no holidays. For that reason, weather and holidays were not included as predictors.

Setup

This document provides a step-by-step overview of our model creation and analysis. The process can be followed not only to replicate our model for similar use cases but also to tackle a range of other parking-related problems, such as predicting parking demand to implement surge pricing, rebalancing bikeshare systems, or predicting transit station traffic.

It is important to note that this model is a simplified version of what the finalized app would do. Creating a fully realized product would require far more time and processing power than this project allowed.

Begin by conducting an initial setup and loading in all the required libraries and functions. Load data for San Francisco geometry including county lines and neighborhood outlines. Load San Francisco census data including median income, median age, percentage of white population, travel time to work, number of commuters, means of public transit, and total public transit usage. Additionally, load San Francisco census tract geometry.

#setting working directory

setwd("C:/Users/Hannah/Documents/Penn/Fall 2020/CPLN-592/Assignments/Final Project/CPLN592_Final")

#load libraries

library(tidyverse)
library(sf)
library(lubridate)
library(tigris)
library(tidycensus)
library(viridis)
library(riem)
library(gridExtra)
library(knitr)
library(kableExtra)
library(RSocrata)
library(mapview)
library(FNN)
library(osmdata)

# load styles

mapTheme <- function(base_size = 12) {
  theme(
    text = element_text(family = "Helvetica", color = "black"),
    plot.title = element_text(family = "Helvetica", size = 16,colour = "black"),
    plot.subtitle=element_text(family = "Helvetica", face="italic", size = 12),
    plot.caption=element_text(hjust=0, size = 10),
    axis.ticks = element_blank(),
    panel.background = element_blank(),axis.title = element_blank(),
    axis.text = element_blank(),
    axis.title.x = element_blank(),
    axis.title.y = element_blank(),
    panel.grid.minor = element_blank(),
    panel.border = element_rect(colour = "black", fill=NA, size=2)
  )
}

plotTheme <- function(base_size = 12) {
  theme(
    text = element_text(family = "Helvetica", color = "black"),
    plot.title = element_text(family = "Helvetica", size = 16 ,colour = "black"),
    plot.subtitle = element_text(family = "Helvetica", face="italic"),
    plot.caption = element_text(family = "Helvetica", hjust=0),
    axis.ticks = element_blank(),
    panel.background = element_blank(),
    panel.grid.major = element_line("grey60", size = 0.1),
    panel.grid.minor = element_blank(),
    panel.border = element_rect(colour = "black", fill=NA, size=2),
    strip.background = element_rect(fill = "grey60", color = "white"),
    strip.text = element_text(family = "Helvetica", size=10),
    axis.title = element_text(family = "Helvetica", size=10),
    axis.text = element_text(size=8),
    plot.background = element_blank(),
    legend.background = element_blank(),
    legend.title = element_text(family = "Helvetica", colour = "black", face = "italic"),
    legend.text = element_text(family = "Helvetica", colour = "black", face = "italic"),
    strip.text.x = element_text(family = "Helvetica", size = 12)
  )
}

# Load Quantile break functions

qBr <- function(df, variable, rnd) {
  if (missing(rnd)) {
    as.character(quantile(round(df[[variable]], 0),
                          c(.01, .2, .4, .6, .8), na.rm = TRUE))
  } else if (rnd == FALSE) {
    # when rnd = FALSE, format the unrounded quantiles to 3 significant digits
    as.character(formatC(quantile(df[[variable]],
                                  c(.01, .2, .4, .6, .8), na.rm = TRUE),
                         digits = 3))
  }
}

q5 <- function(variable) {as.factor(ntile(variable, 5))}

# Load hexadecimal color palette

palette6 <-c("#97CCC4","#92D9C0","#9AE5B3","#B0EF9F","#D0F588","#F9F871")
palette5 <- c("#eff3ff","#bdd7e7","#6baed6","#3182bd","#08519c")
palette4 <- c("#D2FBD4","#92BCAB","#527D82","#123F5A")
palette2 <- c("#97ccc4ff","#de564d")
palettea <-c("#97ccc4ff","#de564d")

# Load nn function

nn_function <- function(measureFrom, measureTo, k) {
  # coerce both point sets to matrices for FNN::get.knnx
  measureFrom_Matrix <- as.matrix(measureFrom)
  measureTo_Matrix <- as.matrix(measureTo)
  nn <- get.knnx(measureTo_Matrix, measureFrom_Matrix, k)$nn.dist
  output <-
    as.data.frame(nn) %>%
    rownames_to_column(var = "thisPoint") %>%
    gather(points, point_distance, V1:ncol(.)) %>%
    arrange(as.numeric(thisPoint)) %>%
    group_by(thisPoint) %>%
    summarize(pointDistance = mean(point_distance)) %>%
    arrange(as.numeric(thisPoint)) %>% 
    dplyr::select(-thisPoint) %>%
    pull()
  
  return(output)  
}
#load san francisco county shapefile

sfCounty <-
  st_read("https://data.sfgov.org/api/geospatial/p5b7-5n3h?method=export&format=GeoJSON") %>% 
  st_union()
sfCounty <- st_transform(sfCounty, 'EPSG:6339')

#load neighborhood boundaries

neighborhoods <- 
  st_read("https://data.sfgov.org/api/geospatial/pty2-tcw4?method=export&format=GeoJSON") %>%
  dplyr::select(-link)
neighborhoods <-  st_transform(neighborhoods,'EPSG:6339') 

#load san francisco census tracts

sfCensus <- 
  get_acs(geography = "tract", 
          variables = c("B01003_001", "B19013_001", 
                        "B02001_002", "B08013_001",
                        "B08012_001", "B08301_001", 
                        "B08301_010", "B01002_001"), 
          year = 2018, 
          state = "CA", 
          geometry = TRUE, 
          county=c("San Francisco"),
          output = "wide") %>%
  rename(Total_Pop =  B01003_001E,
         Med_Inc = B19013_001E,
         Med_Age = B01002_001E,
         White_Pop = B02001_002E,
         Travel_Time = B08013_001E,
         Num_Commuters = B08012_001E,
         Means_of_Transport = B08301_001E,
         Total_Public_Trans = B08301_010E) %>%
  dplyr::select(Total_Pop, Med_Inc, White_Pop, Travel_Time,
                Means_of_Transport, Total_Public_Trans,
                Med_Age,
                GEOID, geometry) %>%
  mutate(Percent_White = White_Pop / Total_Pop,
         Mean_Commute_Time = Travel_Time / Total_Public_Trans,
         Percent_Taking_Public_Trans = Total_Public_Trans / Means_of_Transport)
sfTracts <- 
  sfCensus %>%
  as.data.frame() %>%
  distinct(GEOID, .keep_all = TRUE) %>%
  dplyr::select(GEOID, geometry) %>% 
  st_sf

Next, load in the data. We used SFMTA Parking Meter Detailed Revenue Transactions data from SFData, a dataset in which each row is a single transaction by a single customer at a single meter in San Francisco. Because the dataset was so large, we downloaded parking meter usage data for only one work week, April 30th to May 4th of 2018. We selected this week for its consistent weather and lack of holidays. We then cleaned the data by selecting only relevant variables and filtering for street parking only. Next, we created time intervals for each parking session’s end time using the lubridate package. For the purposes of this project, we assumed that every ending parking session was equally likely to have expired. Finally, we joined the dataset to a file containing the geometry of every parking meter in San Francisco and assigned each parking meter to its block.

# exclude offshore census tracts

`%nin%` = Negate(`%in%`)
sfTracts <- subset(sfTracts, GEOID %nin% c("06075980401","06075017902"))

#load parking meter data

ParkingMeters.dat <- st_read("SFMTA_Parking_Meter_Detailed_Revenue_Transactions.csv")

# filter for only on-street parking meters

ParkingMeters <- subset(ParkingMeters.dat, Lot %in% "street")

#select desired variables

ParkingMeters<- ParkingMeters%>%
  dplyr::select(POST_ID,STREET_BLOCK, SESSION_START_DT, SESSION_END_DT, GROSS_PAID_AMT)

# create intervals for session end time

ParkingMeters <- ParkingMeters %>%
  mutate(interval60 = floor_date(mdy_hm(SESSION_END_DT), unit = "hour"),
         interval15 = floor_date(mdy_hm(SESSION_END_DT), unit = "15 mins"),
         week = week(interval60),
         dotw = wday(interval60, label=TRUE))

# convert the string columns to dates

ParkingMeters$SESSION_START_DT <- as.POSIXct(ParkingMeters$SESSION_START_DT,
                                             format='%m/%d/%Y %H:%M')
ParkingMeters$SESSION_END_DT       <- as.POSIXct(ParkingMeters$SESSION_END_DT,
                                                 format='%m/%d/%Y %H:%M')
ParkingMeters$ParkTime <- difftime(ParkingMeters$SESSION_END_DT, ParkingMeters$SESSION_START_DT, units = "hours")

# create cost per parking session

ParkingMeters$ParkTimeMin <-(ParkingMeters$ParkTime)*60
ParkingMeters$ParkTimeMin2 <-as.numeric(ParkingMeters$ParkTimeMin)
ParkingMeters$GROSS_PAID_AMT2 <-as.numeric(ParkingMeters$GROSS_PAID_AMT)
ParkingMeters$ParkingRate <- ParkingMeters$GROSS_PAID_AMT2/ParkingMeters$ParkTimeMin2

#load parking meters shapefile

ParkingMeters.sf <- 
  st_read("Parking_Meters.csv")

#assign a block "station" to each parking meter  

ParkingMetersUnique <-
  ParkingMeters %>%
  dplyr::select(STREET_BLOCK, POST_ID)

ParkingMetersUnique <- ParkingMetersUnique[!duplicated(ParkingMetersUnique$STREET_BLOCK), ]
ParkingMetersUnique<-
  ParkingMetersUnique%>%
  rename(StationID =POST_ID)

ParkingMeters <- merge(ParkingMeters, ParkingMetersUnique, by='STREET_BLOCK')
ParkingMeters.sf<-
  ParkingMeters.sf%>%
  rename(StationID=POST_ID)
ParkingMeters.sf <- merge(ParkingMeters, ParkingMeters.sf, by='StationID')
ParkingMeters.sf$LONGITUDE2 <-ParkingMeters.sf$LONGITUDE
ParkingMeters.sf$LATITUDE2 <- ParkingMeters.sf$LATITUDE

#combine parking meters.csv to parking meters shapefile

ParkingMeters.sf <- ParkingMeters.sf[!(ParkingMeters.sf$LATITUDE == ""), ]
ParkingMeters.sf <- ParkingMeters.sf[!(ParkingMeters.sf$LONGITUDE == ""), ]
# the coordinate columns are raw longitude/latitude, so read them as WGS84
# (EPSG:4326) before projecting to the analysis CRS
ParkingMeters.sf <- st_as_sf(x = ParkingMeters.sf, coords = c("LONGITUDE","LATITUDE"), crs = 4326)
ParkingMeters.sf <- st_transform(ParkingMeters.sf, 'EPSG:6339')

#add neighborhoods

ParkingMeters.sf <- st_join(ParkingMeters.sf,neighborhoods, join = st_within)

# select desired variables

ParkingMeters.sf <-
  ParkingMeters.sf%>%
  dplyr::select(StationID,STREET_BLOCK,POST_ID,SESSION_START_DT,SESSION_END_DT,GROSS_PAID_AMT,interval60,interval15,week,dotw,ParkTime,
                ParkTimeMin2,GROSS_PAID_AMT2,ParkingRate,Neighborhoods,COLLECTION_SUBROUTE,LONGITUDE2,LATITUDE2,name)

# Adding census data to parking data

ParkingMeter_census <- st_join(ParkingMeters.sf %>% 
                           filter(is.na(LONGITUDE2) == FALSE &
                                    is.na(LATITUDE2) == FALSE)  %>%
                           st_as_sf(., coords = c("LONGITUDE2", "LATITUDE2"), crs = 'EPSG:6339'),
                         sfTracts %>%
                           st_transform(crs='EPSG:6339'),
                         join=st_intersects,
                         left = TRUE) %>%
  rename(Origin.Tract = GEOID) %>%
  mutate(start_station_longitude = unlist(map(geometry, 1)),
         start_station_latitude = unlist(map(geometry, 2)))%>%
  as.data.frame() %>%
  dplyr::select(-geometry) 

Next, once the parking meter data is set, add weather data. The following function downloads weather data from San Francisco International Airport (SFO) between April 30th and May 4th, 2018. Figure 1 shows that the week we selected saw little change in weather. We purposefully chose a week with even weather to maximize consistency in parking trends between our training days and test days.

#weather
weather.Panel <- 
  riem_measures(station = "SFO", date_start = "2018-04-30", date_end = "2018-05-04") %>%
  dplyr::select(valid, tmpf, p01i, sknt)%>%
  replace(is.na(.), 0) %>%
  mutate(interval60 = ymd_h(substr(valid,1,13))) %>%
  mutate(week = week(interval60),
         dotw = wday(interval60, label=TRUE)) %>%
  group_by(interval60) %>%
  summarize(Temperature = max(tmpf),
            Precipitation = sum(p01i),
            Wind_Speed = max(sknt)) %>%
  mutate(Temperature = ifelse(Temperature == 0, 42, Temperature))

grid.arrange(
  ggplot(weather.Panel, aes(interval60,Precipitation)) + geom_line(color = "#91ccc3", size = 1) + 
    labs(title="Precipitation", x="Hour", y="Precipitation") + plotTheme(),
  ggplot(weather.Panel, aes(interval60,Wind_Speed)) + geom_line(color = "#91ccc3", size = 1) + 
    labs(title="Wind Speed", x="Hour", y="Wind Speed") + plotTheme(),
  ggplot(weather.Panel, aes(interval60,Temperature)) + geom_line(color = "#91ccc3", size = 1) + 
    labs(title="Temperature", x="Hour", y="Temperature",caption = "Figure 1") + plotTheme(),
  top="Weather Data - San Francisco SFO - April-May, 2018")

Data Exploration

We begin by examining the time and frequency of our data. Figure 2 illustrates the number of parking sessions per hour during the week. While the consistency in sessions per day allows the model to predict parking sessions on Thursday and Friday based on the training data from Monday - Wednesday, more accuracy could potentially be obtained by predicting a week’s worth of data based on the previous week. However, our limited processing power did not permit this.

#plot parking sessions per hour

ggplot(ParkingMeter_census %>%
         group_by(interval60) %>%
         tally())+
  geom_line(aes(x = interval60, y = n), color = "#91ccc3", size = 1.25)+
  labs(title="Parking Sessions per Hour,\n San Francisco, May, 2018",
       x="Date", 
       y="Number of Sessions", caption = "Figure 2")+
  plotTheme()

Figures 3 through 5 further illustrate the time and frequency of the data and confirm the consistent pattern across days of the week and the single peak over the course of each day.

# Mean number of hourly parking sessions

ParkingMeter_census %>%
  mutate(time_of_day = case_when(hour(interval60) < 7 | hour(interval60) > 18 ~ "Overnight",
                                 hour(interval60) >= 7 & hour(interval60) < 10 ~ "AM Rush",
                                 hour(interval60) >= 10 & hour(interval60) < 15 ~ "Mid-Day",
                                 hour(interval60) >= 15 & hour(interval60) <= 18 ~ "PM Rush"))%>%
  group_by(interval60, StationID, time_of_day) %>%
  tally()%>%
  group_by(StationID, time_of_day)%>%
  summarize(mean_trips = mean(n))%>%
  ggplot()+
  geom_histogram(aes(mean_trips), binwidth = 1, fill = "#91ccc3")+
  labs(title="Mean Number of Hourly Parking Sessions per Block,\nSan Francisco, April - May, 2018",
       x="Number of Sessions", 
       y="Frequency",caption = "Figure 3")+
  facet_wrap(~time_of_day)+
  plotTheme()

# Parking sessions per hour by block

ggplot(ParkingMeter_census %>%
         group_by(interval60, StationID) %>%
         tally())+
  geom_histogram(aes(n), binwidth = 2, fill = "#91ccc3")+
  labs(title="Parking Sessions per Hour by Block,\nSan Francisco, April-May, 2018",
       x="Session Counts", 
       y="Number of Blocks",caption = "Figure 4")+
  plotTheme()

# Number of parking sessions by hour in a week

ggplot(ParkingMeter_census %>% mutate(hour = hour(SESSION_END_DT)))+
  geom_freqpoly(aes(hour, color = dotw), binwidth = 1, size = 1)+
  labs(title="Parking Sessions by Day of the Week,\nSan Francisco, April-May, 2018",
       x="Hour", 
       y="Session Counts",caption = "Figure 5")+
  scale_color_manual(values=c("#97CCC4","#9AE5B3","#B0EF9F","#D0F588","#F9F871"))+
  plotTheme()

# Prepare data for plot

ParkingMeter_census_plot <- ParkingMeter_census
# filter on the LONGITUDE2/LATITUDE2 columns carried through the census join
ParkingMeter_census_plot <- ParkingMeter_census_plot[!(ParkingMeter_census_plot$LATITUDE2 == ""), ]
ParkingMeter_census_plot <- ParkingMeter_census_plot[!(ParkingMeter_census_plot$LONGITUDE2 == ""), ]
ParkingMeter_census_plot <- st_as_sf(x = ParkingMeter_census_plot, coords = c("LONGITUDE2","LATITUDE2"), crs = 4326)
ParkingMeter_census_plot <- st_transform(ParkingMeter_census_plot, 'EPSG:6339')

Figure 6 shows the spatial pattern and frequency of parking sessions at different times of the day and reveals that parking sessions are most clustered during mid-day and during the PM rush.

# Plot parking by block

ggplot()+
  geom_sf(data = sfTracts %>%
            st_transform(crs='EPSG:6339'), colour = '#efefef')+
  geom_sf(data = ParkingMeter_census_plot %>% 
               mutate(hour = hour(SESSION_END_DT),
                      time_of_day = case_when(hour(interval60) < 7 | hour(interval60) > 18 ~ "Overnight",
                                              hour(interval60) >= 7 & hour(interval60) < 10 ~ "AM Rush",
                                              hour(interval60) >= 10 & hour(interval60) < 15 ~ "Mid-Day",
                                              hour(interval60) >= 15 & hour(interval60) <= 18 ~ "PM Rush"))%>%
               group_by(StationID, time_of_day) %>%
               tally(),
             aes( color = n), 
             fill = "transparent",  size = 1)+
  scale_colour_viridis(direction = -1,
                       discrete = FALSE, option = "D")+
  facet_grid(~ time_of_day)+
  labs(title="Parking Sessions per Hour by Block,\nSan Francisco, April-May, 2018",caption = "Figure 6")+
  mapTheme()

The animation in Figure 7 further reveals the spatial pattern of parking sessions over the course of one day. Consistent with our earlier data exploration, it confirms that parking sessions are densest at a single midday peak.

library(gganimate)
library(gifski)

monday <-
  filter(ParkingMeter_census ,dotw == "Mon")

monday.panel <-
  expand.grid(
    interval15 = unique(monday$interval15),
    StationID = unique(ParkingMeter_census$StationID))

ride.animation.data <-
  mutate(monday, Trip_Counter = 1) %>%
  select(interval15, StationID, LONGITUDE2, LATITUDE2, Trip_Counter) %>%
  group_by(interval15, StationID, LONGITUDE2, LATITUDE2) %>%
  summarize(Trip_Count = sum(Trip_Counter, na.rm=T)) %>% 
  ungroup() %>% 
  mutate(Trips = case_when(Trip_Count == 0 ~ "0 trips",
                           Trip_Count > 0 & Trip_Count <= 2 ~ "0-2 trips",
                           Trip_Count > 2 & Trip_Count <= 5 ~ "2-5 trips",
                           Trip_Count > 5 & Trip_Count <= 10 ~ "5-10 trips",
                           Trip_Count > 10 & Trip_Count <= 15 ~ "10-15 trips",
                           Trip_Count > 15 ~ "15+ trips")) %>%
  mutate(Trips  = fct_relevel(Trips, "0 trips","0-2 trips","2-5 trips",
                              "5-10 trips","10-15 trips","15+ trips"))
ride.animation.data <- ride.animation.data[!(ride.animation.data$LATITUDE2 == ""), ]
ride.animation.data <- ride.animation.data[!(ride.animation.data$LONGITUDE2 == ""), ]
# raw longitude/latitude coordinates: declare WGS84 first, then project
ride.animation.data <- st_as_sf(x = ride.animation.data, coords = c("LONGITUDE2","LATITUDE2"), crs = 4326)
ride.animation.data <- st_transform(ride.animation.data, 'EPSG:6339')

rideshare_animation <-
  ggplot()+
  geom_sf(data = sfTracts %>%
            st_transform(crs='EPSG:6339'), colour = '#efefef')+
  # Trips has six levels, so map it to the six-colour palette; alpha must be <= 1
  geom_sf(data = ride.animation.data, 
          aes(colour = Trips), size = 0.5, alpha = 1) +
  scale_colour_manual(values = palette6) +
  labs(title = "Parking Sessions by Block for April 30th, 2018",
       subtitle = "15 minute intervals: {current_frame}",caption = "Figure 7") +
  transition_manual(interval15) +
  mapTheme()

animate(rideshare_animation, duration=20, renderer = gifski_renderer())

Feature Engineering

After our initial data exploration, we begin fine-tuning our predictive model. Before adding features, we first create a time-series panel with one row for every unique combination of block, hour, and day. This “panel” dataset contains a row for each time period whether or not a parking session took place then: if a block had no parking sessions in a given hour, it receives a zero for that hour. With more processing power, this process could be done at fifteen or even ten minute intervals.

The panel is created by making an empty data frame, study.panel, with one row per unique space/time observation, using the expand.grid and unique functions. We then join on the block, tract, and latitude and longitude, and create the full panel by summarizing counts by block for each time interval. We keep the census, latitude, and longitude information to join with our engineered features.

# create study panel

study.panel <- 
  expand.grid(interval60 = unique(ParkingMeters$interval60), 
              StationID = unique(ParkingMeters$StationID))
study.panel <-
left_join(study.panel, ParkingMeter_census %>%
            dplyr::select(StationID,  Origin.Tract, name, LONGITUDE2, LATITUDE2 )%>%
            distinct() %>%
            group_by(StationID) %>%
            slice(1))

#create ride.panel

ride.panel <- 
  ParkingMeter_census %>%
  mutate(Trip_Counter = 1) 

ride.panel <- 
  ride.panel%>%
  right_join(study.panel)

ride.panel <-
  ride.panel%>%
  group_by(interval60, StationID, Origin.Tract,name, LONGITUDE2, LATITUDE2) %>%
  summarize(Trip_Count = sum(Trip_Counter, na.rm=T)) %>%
  left_join(weather.Panel) %>%
  ungroup() %>%
  filter(is.na(StationID) == FALSE) %>%
  mutate(week = week(interval60),
         dotw = wday(interval60, label = TRUE)) %>%
  filter(is.na(Origin.Tract) == FALSE)

ride.panel <- 
  left_join(ride.panel, sfCensus %>%
              as.data.frame() %>%
              dplyr::select(-geometry), by = c("Origin.Tract" = "GEOID"))

As shown in the exploratory analysis, the time of day makes a difference in the number of parking sessions per block. Here we create time lag variables which will give us additional information about the number of ending sessions during a given time period.

Table 1 shows strong correlations between the number of sessions and each of the time lags. The strongest positive correlation is with the one-hour lag (r = 0.91). This makes sense because, as shown in the exploratory analysis, the number of ending parking sessions follows a pronounced daily pattern: the count for a given hour resembles that of the next hour, and the strong positive correlation with the one-day lag (r = 0.75) shows the pattern repeats the next day as well. The lag features were therefore included in the predictive model.
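The correlation values reported in Table 1 are entered by hand in the code below; each can be computed directly from the panel with cor() between Trip_Count and the corresponding lag column. A self-contained illustration on synthetic counts (on the real panel, the two vectors would be ride.panel$Trip_Count and ride.panel$lagHour):

```r
# Illustrative one-hour lag correlation on synthetic hourly counts; base R only.
set.seed(42)
counts  <- rpois(200, lambda = 4)
lag_1hr <- c(NA, head(counts, -1))  # shift the series forward one hour

# Pearson correlation between each hour's count and the previous hour's,
# dropping the NA introduced by the shift. For an i.i.d. series like this
# one the value is near zero; the real panel's strong daily rhythm is what
# produces r = 0.91.
r_lag <- cor(counts, lag_1hr, use = "complete.obs")
r_lag
```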

#add time lags feature

library(kableExtra)
ride.panel <- 
  ride.panel %>% 
  arrange(StationID, interval60) %>% 
  mutate(lagHour = dplyr::lag(Trip_Count,1),
         lag2Hours = dplyr::lag(Trip_Count,2),
         lag3Hours = dplyr::lag(Trip_Count,3),
         lag4Hours = dplyr::lag(Trip_Count,4),
         lag12Hours = dplyr::lag(Trip_Count,12),
         lag1day = dplyr::lag(Trip_Count,24)) %>%
  mutate(day = yday(interval60))

ride.panel <-
  ride.panel %>%
  mutate(X = LONGITUDE2, Y = LATITUDE2 )%>%
  st_as_sf(coords = c("X", "Y"), crs = 'EPSG:6339', agr = "constant") %>%
  st_transform('EPSG:6339')

time_lag <- c("Lag (1 hour)", "Lag (2 hours)", "Lag (3 hours)", "Lag (4 hours)", "Lag (12 hours)", "Lag (1 day)")
lag_corr <- c(0.91, 0.74, 0.53, 0.32, -0.73, 0.75)

# add time of day feature

ride.panel<-
  ride.panel%>%
  mutate(
       time_of_day = case_when(hour(interval60) < 7 | hour(interval60) > 18 ~ "Overnight",
                               hour(interval60) >= 7 & hour(interval60) < 10 ~ "AM Rush",
                               hour(interval60) >= 10 & hour(interval60) < 15 ~ "Mid-Day",
                               hour(interval60) >= 15 & hour(interval60) <= 18 ~ "PM Rush"))

# create lag correlation table

lag_correlation <- data.frame(time_lag, lag_corr)

kable(lag_correlation, format = "html", col.names = c("Time Lag", "Correlation")) %>% 
  kable_styling() %>%
  footnote(general_title = "\n",
           general = "Table 1")
Time Lag Correlation
Lag (1 hour) 0.91
Lag (2 hours) 0.74
Lag (3 hours) 0.53
Lag (4 hours) 0.32
Lag (12 Hours) -0.73
Lag (1 day) 0.75

Table 1

The demand for parking likely depends on nearby amenities and accessibility to transit. To help account for this, we added two exposure features to our predictive model: proximity to bars and restaurants, and distance to transit. We also included neighborhood in the predictive model, since the number of parking sessions seemed to vary over space.

# add distance to bars/restaurants

# the bounding geometry is the same census tract set built above, so reuse it
sfbox <- sfTracts

sfbox <- st_union(sfbox)
sfbox <-sfbox%>%
  st_transform(crs=4326)
xmin = st_bbox(sfbox)[[1]]
ymin = st_bbox(sfbox)[[2]]
xmax = st_bbox(sfbox)[[3]]  
ymax = st_bbox(sfbox)[[4]]

bars <- opq(bbox = c(xmin, ymin, xmax, ymax)) %>% 
  add_osm_feature(key = 'amenity', value = c("bar", "pub", "restaurant")) %>%
  osmdata_sf()

bars <- 
  bars$osm_points %>%
  .[sfbox,]

# project to the analysis CRS (st_set_crs is unnecessary after st_transform)
bars <- st_transform(bars, 'EPSG:6339')

ride.panel <-
  ride.panel %>% 
  mutate(
    bar_dist = nn_function(st_coordinates(ride.panel), st_coordinates(bars), 1))

#add distance to transit feature

transit_stops <- st_read("https://opendata.arcgis.com/datasets/561dc5b42fa9451b95faf615a3054260_0.geojson") %>%
  st_transform('EPSG:6339')

ride.panel <-
  ride.panel %>% 
  mutate(
    station_dist = nn_function(st_coordinates(ride.panel), st_coordinates(transit_stops), 1))

Testing the Model

We created six linear models using the lm function. The first models included only temporal or spatial information, while the more advanced ones contained all of the time lag information and the other exposure features added in the previous section. We evaluated the predictive power of each model by dividing the dataset into a training set and a test set, letting us gauge how the model might perform on new data. As mentioned previously, Monday through Wednesday formed the training set and Thursday and Friday formed the test set. To compare the accuracy of the models we used k-fold cross validation and looked for the lowest mean absolute error (MAE). After selecting a model, we then analyzed how well it predicted across neighborhoods and looked for clustering of errors.
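The k-fold cross-validation step referenced here is not shown in this chunk; a minimal base-R sketch of the procedure is below. Synthetic data stands in for ride.Train, and Trip_Count, lagHour, and bar_dist mirror the panel's column names, so this is an illustration of the technique rather than our exact run:

```r
# 5-fold cross-validation computing MAE per fold, base R only.
# `toy` stands in for ride.Train; the columns echo the panel's names but
# the values here are synthetic.
set.seed(717)
toy <- data.frame(
  Trip_Count = rpois(500, lambda = 4),
  lagHour    = rpois(500, lambda = 4),
  bar_dist   = runif(500, 0, 500)
)

k <- 5
folds <- sample(rep(1:k, length.out = nrow(toy)))  # random fold assignment

fold_mae <- sapply(1:k, function(i) {
  # fit on the k-1 held-in folds, predict the held-out fold
  fit  <- lm(Trip_Count ~ lagHour + bar_dist, data = toy[folds != i, ])
  pred <- predict(fit, newdata = toy[folds == i, ])
  mean(abs(toy$Trip_Count[folds == i] - pred))
})

mean(fold_mae)  # average MAE across the five folds
```

On the real panel, each candidate regression would be cross-validated this way and the specification with the lowest average MAE preferred.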

For our initial analysis of the six models, we first created a function called model_pred, which we apply to each data frame in our nested structure. As the output below and Figures 7 and 8 show, the MAE decreased as more spatial and temporal features were added to the regression. The model with all the features most closely resembles the temporal pattern of ending parking sessions.

# create training and test data

ride.Train <- subset(ride.panel, dotw %in% c("Mon","Tue","Wed"))
ride.Test <- subset(ride.panel, dotw %in% c("Thu","Fri"))

# test 6 different regressions

reg1 <- lm(Trip_Count ~  hour(interval60),  data=ride.Train)

reg2 <- lm(Trip_Count ~  StationID,  data=ride.Train)

reg3 <- lm(Trip_Count ~  StationID + hour(interval60), 
           data=ride.Train)

reg4 <- lm(Trip_Count ~  StationID +  hour(interval60) +
             lagHour + lag2Hours + lag3Hours + lag12Hours + lag1day, 
           data=ride.Train)
reg5 <- lm(Trip_Count ~  StationID +  hour(interval60) +
             lagHour + lag2Hours + lag3Hours + lag12Hours + lag1day +bar_dist +station_dist, 
           data=ride.Train)
reg6 <-lm(Trip_Count ~  StationID +  hour(interval60) +
            lagHour + lag2Hours + lag3Hours + lag12Hours + lag1day +bar_dist +station_dist+name, 
          data=ride.Train)

# Predict on test data

ride.Test.weekNest <- 
  ride.Test %>%
  nest(-week) 

model_pred <- function(dat, fit){
  predict(fit, newdata = dat)
}

week_predictions <- 
  ride.Test.weekNest %>% 
  mutate(ATime_FE = map(.x = data, fit = reg1, .f = model_pred),
         BSpace_FE = map(.x = data, fit = reg2, .f = model_pred),
         CTime_Space_FE = map(.x = data, fit = reg3, .f = model_pred),
         DTime_Space_FE_timeLags = map(.x = data, fit = reg4, .f = model_pred),
         ETime_Space_FE_timeLags_Features = map(.x = data, fit = reg5, .f = model_pred),
         All_features=map(.x = data, fit = reg6, .f = model_pred)) %>% 
  gather(Regression, Prediction, -data, -week) %>%
  mutate(Observed = map(data, pull, Trip_Count),
         Absolute_Error = map2(Observed, Prediction,  ~ abs(.x - .y)),
         MAE = map_dbl(Absolute_Error, mean, na.rm = TRUE),
         sd_AE = map_dbl(Absolute_Error, sd, na.rm = TRUE))

week_predictions
## # A tibble: 6 x 8
##    week data      Regression     Prediction Observed  Absolute_Error   MAE sd_AE
##   <dbl> <list>    <chr>          <list>     <list>    <list>         <dbl> <dbl>
## 1    18 <tibble ~ ATime_FE       <dbl [64,~ <dbl [64~ <dbl [64,152]>  3.72  4.55
## 2    18 <tibble ~ BSpace_FE      <dbl [64,~ <dbl [64~ <dbl [64,152]>  3.35  3.94
## 3    18 <tibble ~ CTime_Space_FE <dbl [64,~ <dbl [64~ <dbl [64,152]>  3.31  3.91
## 4    18 <tibble ~ DTime_Space_F~ <dbl [64,~ <dbl [64~ <dbl [64,152]>  1.77  2.69
## 5    18 <tibble ~ ETime_Space_F~ <dbl [64,~ <dbl [64~ <dbl [64,152]>  1.77  2.69
## 6    18 <tibble ~ All_features   <dbl [64,~ <dbl [64~ <dbl [64,152]>  1.77  2.69
# preserve a copy of the predictions for later plotting
week_predictions2 <- week_predictions

week_predictions_plot <-week_predictions %>% 
  mutate(interval60 = map(data, pull, interval60),
         StationID = map(data, pull, StationID), 
         LATITUDE2 = map(data, pull, LATITUDE2), 
         LONGITUDE2 = map(data, pull, LONGITUDE2),
         dotw = map(data, pull, dotw) ) %>%
  dplyr::select(interval60, StationID, LATITUDE2, 
         LONGITUDE2, Observed, Prediction, Regression,
         dotw) %>%
  unnest() %>%
  filter(Regression == "All_features")%>%
  mutate(
    time_of_day = case_when(hour(interval60) < 7 | hour(interval60) > 18 ~ "Overnight",
                            hour(interval60) >= 7 & hour(interval60) < 10 ~ "AM Rush",
                            hour(interval60) >= 10 & hour(interval60) < 15 ~ "Mid-Day",
                            hour(interval60) >= 15 & hour(interval60) <= 18 ~ "PM Rush")) %>%
  group_by(StationID, time_of_day, LONGITUDE2, LATITUDE2) %>%
  summarize(MAE = mean(abs(Observed-Prediction), na.rm = TRUE))
week_predictions_plot <- week_predictions_plot[!(week_predictions_plot$LATITUDE2 == ""), ]
week_predictions_plot <- week_predictions_plot[!(week_predictions_plot$LONGITUDE2 == ""), ]
week_predictions_plot <- st_as_sf(x = week_predictions_plot, coords = c("LONGITUDE2","LATITUDE2"), crs = 4326)
week_predictions_plot<- st_transform(week_predictions_plot,"EPSG:6339")

#week predictions plot neighborhood

week_predictions_plot_neighbor2 <-week_predictions %>% 
  mutate(interval60 = map(data, pull, interval60),
         name = map(data,pull,name),
         StationID = map(data, pull, StationID), 
         LATITUDE2 = map(data, pull, LATITUDE2), 
         LONGITUDE2 = map(data, pull, LONGITUDE2),
         dotw = map(data, pull, dotw) ) %>%
  dplyr::select(interval60, StationID, LATITUDE2, 
                LONGITUDE2, Observed, Prediction, Regression,
                dotw,name) %>%
  unnest() %>%
  filter(Regression == "All_features")%>%
  mutate(
    time_of_day = case_when(hour(interval60) < 7 | hour(interval60) > 18 ~ "Overnight",
                            hour(interval60) >= 7 & hour(interval60) < 10 ~ "AM Rush",
                            hour(interval60) >= 10 & hour(interval60) < 15 ~ "Mid-Day",
                            hour(interval60) >= 15 & hour(interval60) <= 18 ~ "PM Rush")) %>%
  group_by(StationID, time_of_day, LONGITUDE2, LATITUDE2,name) %>%
  summarize(MAE = mean(abs(Observed-Prediction), na.rm = TRUE))
week_predictions_plot_neighbor2 <- week_predictions_plot_neighbor2[!(week_predictions_plot_neighbor2$LATITUDE2 == ""), ]
week_predictions_plot_neighbor2 <- week_predictions_plot_neighbor2[!(week_predictions_plot_neighbor2$LONGITUDE2 == ""), ]
week_predictions_plot_neighbor2 <- st_as_sf(x = week_predictions_plot_neighbor2, coords = c("LONGITUDE2","LATITUDE2"), crs = 4326)
week_predictions_plot_neighbor2<- st_transform(week_predictions_plot_neighbor2,"EPSG:6339")
# re-run the test-set predictions, relabeling the sixth model F_All_features for plotting order

week_predictions <- 
  ride.Test.weekNest %>% 
  mutate(ATime_FE = map(.x = data, fit = reg1, .f = model_pred),
         BSpace_FE = map(.x = data, fit = reg2, .f = model_pred),
         CTime_Space_FE = map(.x = data, fit = reg3, .f = model_pred),
         DTime_Space_FE_timeLags = map(.x = data, fit = reg4, .f = model_pred),
         ETime_Space_FE_timeLags_Features = map(.x = data, fit = reg5, .f = model_pred),
         F_All_features=map(.x = data, fit = reg6, .f = model_pred)) %>% 
  gather(Regression, Prediction, -data, -week) %>%
  mutate(Observed = map(data, pull, Trip_Count),
         Absolute_Error = map2(Observed, Prediction,  ~ abs(.x - .y)),
         MAE = map_dbl(Absolute_Error, mean, na.rm = TRUE),
         sd_AE = map_dbl(Absolute_Error, sd, na.rm = TRUE))

# Bar plot comparing predictions

week_predictions %>%
  dplyr::select(week, Regression, MAE) %>%
  gather(Variable, MAE, -Regression, -week) %>%
  ggplot(aes(week, MAE)) + 
  geom_bar(aes(fill = Regression), color = "#de564d", position = "dodge", stat="identity") +
  scale_fill_manual(values = palette6) +
  labs(title = "Mean Absolute Error by Model Specification",caption="Figure 7") +
  plotTheme()

# Graph of predictions

week_predictions %>% 
  mutate(interval60 = map(data, pull, interval60),
         StationID = map(data, pull, StationID)) %>%
  dplyr::select(interval60, StationID, Observed, Prediction, Regression) %>%
  unnest() %>%
  gather(Variable, Value, -Regression, -interval60, -StationID) %>%
  group_by(Regression, Variable, interval60) %>%
  summarize(Value = sum(Value)) %>%
  ggplot(aes(interval60, Value, colour=Variable)) + 
  geom_line(size = 1.1) + 
  facet_wrap(~Regression, ncol=1) +
  labs(title = "Predicted/Observed Parking Session Endings Time Series", subtitle = "San Francisco",  x = "Hour", y= "Parking Sessions",caption="Figure 8") +
  scale_color_manual(values=c("#97ccc4","#de564d"))+
  plotTheme()

Cross-validation is one method of testing the generalizability of a model. We took a sample of the data and ran 100-fold cross-validation on it using our sixth model, which included all the features. Even with all the features, our MAE remained above 1, indicating that the model is not as accurate as we would like. While the stakes for this project are not as high as for other possible policy applications, an MAE this high could be problematic in practice: a Denizen app user who goes to a block predicted to have many parking sessions ending but finds none may be less inclined to use the app again. As mentioned previously, a larger dataset with more features would allow for better accuracy and generalizability if this app were implemented.

# Cross validation

library(caret)

netsample <- sample_n(ride.panel, 100000)%>%
  na.omit()

fitControl <- trainControl(method = "cv", 
                           number = 100,
                           savePredictions = TRUE)

set.seed(1000)

# for k-folds CV

reg.cv <-  
  train(Trip_Count ~ StationID +  hour(interval60) +
          lagHour + lag2Hours + lag3Hours + lag12Hours + lag1day +bar_dist +station_dist+name+time_of_day, 
        data = netsample,  
        method = "lm",  
        trControl = fitControl,  
        na.action = na.pass)

reg.cv
## Linear Regression 
## 
## 80363 samples
##    12 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (100 fold) 
## Summary of sample sizes: 79559, 79559, 79559, 79559, 79559, 79560, ... 
## Resampling results:
## 
##   RMSE      Rsquared   MAE     
##   2.890837  0.7431762  1.762118
## 
## Tuning parameter 'intercept' was held constant at a value of TRUE

As Figures 9 and 10 show, the MAE in some areas of the city is higher than in others. In the plot below, higher errors are shown as purple dots and lower errors as yellow dots. The map suggests that our errors exhibit spatial autocorrelation; that is, instances of high error appear to be clustered. For example, the MAE is especially high in the Miraloma Park and Merced Manor neighborhoods. This indicates that certain characteristics of these areas make the model less accurate at predicting parking behavior there. Exposure features specific to those areas might help improve the results.

# map MAE by block

ggplot()+
  geom_sf(data = sfTracts %>%
            st_transform(crs='EPSG:6339'), colour = '#efefef')+
  geom_sf(data=week_predictions_plot,aes( color = MAE), 
             fill = "transparent", alpha = 0.4)+
  scale_colour_viridis(direction = -1,
                       discrete = FALSE, option = "D")+
  labs(title="Mean Absolute Error by Block, Test Set",subtitle="San Francisco",caption="Figure 9")+
  mapTheme()
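The visual impression of clustering could also be tested formally with Moran's I. A sketch using the spdep package on the block-level errors (assuming the week_predictions_plot point layer built above; spdep and the k = 5 neighbor choice are our additions, not part of the original workflow):

```r
# Formal check of spatial autocorrelation in the block-level MAE (Moran's I).
# Assumes week_predictions_plot is the point sf object with an MAE column.
library(sf)
library(spdep)

coords <- st_coordinates(week_predictions_plot)
nb     <- knn2nb(knearneigh(coords, k = 5))  # 5 nearest neighbors per block
lw     <- nb2listw(nb, style = "W")          # row-standardized weights

# A significantly positive statistic would support the clustering
# of errors visible in Figure 9
moran.test(week_predictions_plot$MAE, lw)
```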

#map MAE by neighborhood
week_predictions_plot_neighbor2<-
  st_drop_geometry(week_predictions_plot_neighbor2)%>%
  group_by(name)%>%
  summarize(mean.MAE= mean(MAE,na.rm=T))
neighborhood_plot <-merge(neighborhoods,week_predictions_plot_neighbor2)


ggplot() +
  geom_sf(data = sfTracts %>%
            st_transform(crs='EPSG:6339'), colour = '#efefef')+
  geom_sf(data=neighborhood_plot, aes(fill = mean.MAE)) +
scale_fill_viridis_c(direction=-1)+
  labs(title="Mean MAE by Neighborhood",
  subtitle="San Francisco", 
       caption= "Figure 10") +
  mapTheme()

Our model also does a better job at predicting parking sessions ending at certain times of day than others. Plotting the number of parking sessions ending predicted by the model against actual observed sessions ending by time of day allowed us to diagnose when the model seemed to be predicting accurately and when it was not. In Figure 11, our predicted line (red) falls below the perfect prediction line (black), suggesting we are consistently underpredicting the number of parking sessions ending. This is especially evident during the AM rush.

## Scatterplot of Error by time of day

week_predictions2 %>% 
  mutate(interval60 = map(data, pull, interval60),
         start_station_id = map(data, pull, StationID), 
         start_station_latitude = map(data, pull, LATITUDE2), 
         start_station_longitude = map(data, pull, LONGITUDE2),
         dotw = map(data, pull, dotw)) %>%
  select(interval60, start_station_id, start_station_longitude, 
         start_station_latitude, Observed, Prediction, Regression,
         dotw) %>%
  unnest() %>%
  filter(Regression == "All_features")%>%
  mutate(weekend = ifelse(dotw %in% c("Sun", "Sat"), "Weekend", "Weekday"),
         time_of_day = case_when(hour(interval60) < 7 | hour(interval60) > 18 ~ "Overnight",
                                 hour(interval60) >= 7 & hour(interval60) < 10 ~ "AM Rush",
                                 hour(interval60) >= 10 & hour(interval60) < 15 ~ "Mid-Day",
                                 hour(interval60) >= 15 & hour(interval60) <= 18 ~ "PM Rush"))%>%
  ggplot()+
  geom_point(aes(x= Observed, y = Prediction))+
  geom_smooth(aes(x= Observed, y= Prediction), method = "lm", se = FALSE, color = "#de564d")+
  geom_abline(slope = 1, intercept = 0)+
  facet_grid(~time_of_day)+
  labs(title="Observed vs. Predicted",
       x="Observed Sessions Ending", 
       y="Predicted Sessions Ending",caption="Figure 11")+
  plotTheme()
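The time_of_day bucketing above is repeated verbatim in three places in this section. A small helper (hypothetical, but matching the case_when cutoffs used throughout) would keep the definitions in sync if the rush-hour windows ever change:

```r
# Hypothetical helper replicating the time-of-day buckets used above,
# so the hour cutoffs live in one place.
library(dplyr)
library(lubridate)

time_of_day <- function(ts) {
  h <- hour(ts)
  case_when(
    h < 7 | h > 18    ~ "Overnight",
    h >= 7  & h < 10  ~ "AM Rush",
    h >= 10 & h < 15  ~ "Mid-Day",
    h >= 15 & h <= 18 ~ "PM Rush"
  )
}
```

Each mutate(time_of_day = case_when(...)) call above could then become mutate(time_of_day = time_of_day(interval60)).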

App Interface

After fine-tuning the model, we attempted to visualize how the model would be integrated into the app. Using ggmap, we turned the model's predictions into a heatmap that accounts for both the density of blocks and the number of predicted ending parking sessions at each block, in order to identify hotspots. The following image shows the heatmap of predicted parking sessions ending at 1pm on Thursday. Using ggmap requires registering a Google Maps API key. The key cannot be shared publicly, so in the code below one would substitute their own key for "your key." We ran this code for several times of day.

##preparing data for heatmap

#ride.test2 <-
#  ride.Test %>%
#  mutate(Sessions.Predict = predict(reg.cv, ride.Test))

#ride.test3 <-
# ride.test2%>%
#  dplyr::select(LONGITUDE2,LATITUDE2,Sessions.Predict,dotw,interval60)

#ride.test3$hour <- hour(ride.test3$interval60)
#ride.test3.1 <- ride.test3%>%filter(hour==13)
#ride.test3.1 <-subset(ride.test3.1, dotw %in% "Thu")

##heat map for 1pm thursday

## load libraries
library(ggplot2)
library(ggmap)
library(data.table)
library(devtools)

##register google key

#register_google("your key", write = FALSE)

## google map of san francisco

#map <- get_map(location = "san francisco", zoom = 15)
#ride.test3.1<-setDT(ride.test3.1)


## generate bins for the x, y coordinates

#xbreaks <- seq(floor(min(as.numeric(ride.test3.1$LATITUDE2))), #ceiling(max(as.numeric(ride.test3.1$LATITUDE2))), by = 0.001)
#ybreaks <- seq(floor(min(as.numeric(ride.test3.1$LONGITUDE2))), #ceiling(max(as.numeric(ride.test3.1$LONGITUDE2))), by = 0.001)

## allocate the data points into the bins

#ride.test3.1$latbin <- xbreaks[cut(as.numeric(ride.test3.1$LATITUDE2), breaks = xbreaks, #labels=F)]
#ride.test3.1$longbin <- ybreaks[cut(as.numeric(ride.test3.1$LONGITUDE2), breaks = ybreaks, #labels=F)]

## Summarise the data for each bin

#datamat <- ride.test3.1[, list(Sessions.Predict = mean(Sessions.Predict)), 
#                by = c("latbin", "longbin")]

## Merge the summarised data with all possible x, y coordinate combinations to get 
# a value for every bin

#datamat <- merge(setDT(expand.grid(latbin = xbreaks, longbin = ybreaks)), datamat, 
#                 by = c("latbin", "longbin"), all.x = TRUE, all.y = FALSE)

## Fill up the empty bins 0 to smooth the contour plot

#datamat[is.na(Sessions.Predict), ]$Sessions.Predict <- 0


## Plot the contours

#ggmap(map, extent = "device") +
#  stat_contour(data = datamat, aes(x = longbin, y = latbin, z = Sessions.Predict, 
#                                   fill = ..level.., alpha = ..level..), geom = 'polygon', #binwidth =1) +
#  scale_fill_gradient(name = "Sessions", low = "green", high = "red") +
#  guides(alpha = FALSE)
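Because the chunk above is commented out (it needs a private API key), the binning step can be illustrated on its own with toy data. This sketch mirrors the commented code's logic of snapping point predictions to a lat/long grid and averaging within each cell; the coordinates and counts are fabricated for illustration:

```r
# Toy illustration of the heatmap binning step: aggregate point-level
# predicted session endings into a regular lat/long grid.
library(data.table)

set.seed(1)
pts <- data.table(lat  = runif(1000, 37.70, 37.81),    # fake San Francisco lats
                  lon  = runif(1000, -122.52, -122.36),# fake San Francisco lons
                  pred = rpois(1000, 2))               # fake predicted endings

xbreaks <- seq(37.70, 37.81, by = 0.001)
ybreaks <- seq(-122.52, -122.36, by = 0.001)

# assign each point the lower edge of its grid cell
pts[, latbin  := xbreaks[cut(lat, breaks = xbreaks, labels = FALSE)]]
pts[, longbin := ybreaks[cut(lon, breaks = ybreaks, labels = FALSE)]]

# mean predicted sessions per grid cell, ready for stat_contour
datamat <- pts[, .(Sessions.Predict = mean(pred)), by = .(latbin, longbin)]
```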

The following image shows what this interface would look like for an app user. Using the scrolling bar, the user can view the predicted hotspots and plan their day accordingly.

Conclusion

Our app is likely to raise questions about the feasibility and ethics of allowing ordinary citizens, as opposed to government employees, to administer parking tickets. Before a strategy like this could be implemented, there would need to be a significant amount of planning and coordination with local governments. One likely area of scrutiny is safeguarding against a car being improperly flagged by an app user when a ticket is not actually deserved. The app would need a verification system that could cross-check the date, time, and location of a submitted photo against local parking regulations to validate each instance of alleged illegal parking. Users who consistently submitted incorrect reports could also lose their privileges in the app.

Another issue of great interest to local governments would be the feasibility, optics, and perhaps legality of awarding a percentage of parking ticket revenue to a third-party app provider, and in turn to app users. A financial compensation structure like this is not common and would likely require significant negotiation with government partners, whose approval would be necessary for implementation. Even if the app did administer parking tickets at lower cost than the status quo of hiring parking enforcement officers, such an approach could well provoke public uproar and political pressure against implementation. It is hard to predict how eager cities and their residents would be to embrace the approach we have proposed here, but we do think there is an opportunity - whether with parking tickets or otherwise - to crowdsource certain government functions to the benefit of city governments and residents.

In the same way that private sector companies like Uber and TaskRabbit have found ways to harness the power of freelance workers to great financial success, the public sector too should consider new and creative ways to deliver public services that new technologies have made possible.

Sources:

https://www.carrentals.com/blog/parking-tickets-cost-americans/#:~:text=What%20Parking%20Tickets%20Cost%20Drivers,in%20annual%20parking%20ticket%20revenue

https://priceonomics.com/san-francisco-parking-meters-a-s130mm-industry/

https://stories.freepik.com/